ABSTRACT

Results of individual evaluations of four 
classes of first-year medical students' performance in a 
communication and interviewing skills curriculum were studied using 
meta-analysis. Carkhuff's (1969) Standard Indexes of Discrimination
(DI) and Communication (CI) were administered before and after course 
completion. Enrollment in the four successive classes was 46, 43, 42, 
and 72. Data were analyzed using combined tests and measures of 
effect size. The basic conclusion was that this curriculum produced 
large gains on two standardized measures. It was found that the 
performance by an average student improved 1.37 standard deviation 
units on the DI and 2.55 units on the CI from pre- to posttesting. 
Larger effects were associated with both earlier graduating classes
and traditional students (versus students in a combined six-year 
bachelor of science/M.D. program). It is concluded that the 
curriculum may be compared vis-a-vis student performance measures 
from year to year, as well as over all the years it is offered. 
Combining and synthesizing these individual class findings permits 
greater generalizability and confidence in the evaluation results of
the program than do individual results based on smaller sample sizes.
A Meta-Analytic Evaluation of an Interpersonal Skills Curriculum:
Accumulating Evidence Over Successive Occasions

Abstract 

Results of individual evaluations of four successive classes of medical students' 
performance in a communication and interviewing skills curriculum were quantitatively 
synthesized using combined tests and measures of effect size typically used in literary 
meta-analytic reviews. The basic conclusion was that this curriculum produced gains on 
two standardized measures that were large in magnitude. An average student improved 
1.37 standard deviation units on the Discrimination Index and 2.55 standard deviation units
on the Communication Index from pre- to posttesting. This translates into performance by
an average student on the posttests (i.e., 50th percentile) equivalent to the 91.5th
percentile on the Discrimination pretest and the 99.5th percentile on the Communication
pretest. Larger effects were associated with both earlier graduating classes and
traditional students (versus students in a combined six-year B.S./M.D. program). Gender
did not mediate the effect of the training. 
A Meta-Analytic Evaluation of an Interpersonal Skills Curriculum:
Accumulating Evidence Over Successive Occasions

Methods typically used in meta-analysis to integrate the results of independent
tests of the same hypothesis in reviews of research literature (Glass, 1976, 1978; Glass,
McGaw & Smith, 1981) are also appropriate for program evaluation in certain situations.
One such situation has been referred to as the "independent samples/similar subjects -
successive occasions" case (Wolf, 1982). While measures of student learning at the
conclusion of a course are helpful indexes of the impact of a program and its relative
strengths and weaknesses vis-a-vis student performance, results may vary from term to
term or year to year. Data from any one term/year may not be representative of the
results for other terms/years, so curricular decisions based on them alone may be
misleading. Accumulating evidence over successive presentations of a curriculum would
likely provide a more stable and generalizable assessment of both the direction and
magnitude of impact.

Often the novelty and excitement of a new curricular effort generate increased
interest and motivation on the part of instructors that may influence both the success of
the program and student performance. Instructors, content, and/or the characteristics of
the students may change from year to year. Systematically accumulating evidence over
time would help control for such differences, as well as permit a comparison of
differences on measures of student performance (given similar outcome measures across
different occasions). Additionally, in programs in which small numbers of students
participate at any given time, sample sizes may not be sufficiently large to make valid
inferences. Using meta-analytic procedures can help to mitigate this difficulty by
synthesizing data over successive occasions.

Goals of the nine-week course in interpersonal skills (interviewing and
communication) evaluated in the present study included increasing students' (a) awareness
of the physician's personal impact on his/her patient and the healing process and the
reciprocity of the physician/patient relationship, (b) skill in establishing a trust
relationship with patients, (c) skill in facilitating patient self-exploration and
subsequently his/her own understanding of how the patient is experiencing the problem,
and (d) skill in providing information, reassurance, support, and direction for the patient
(Engler et al., 1981).

These goals can be summarized as increasing the value students place on emotional
proximity and sensitivity to barriers in communication. It was hoped that each student
would be able to respond accurately to both the feeling and meaning expressed by
"patients" by the end of this program. Students were taught to discriminate among three
classes of verbal behavior: initiating behavior from the physician's frame of reference,
responding to the patient's experience (patient's frame of reference), and helping patients
explore their own feelings (and for the physician to be aware of his/her own feelings). The
differential application of the classes of verbalizations is demonstrated as the basis of a
reciprocal relationship in which the patient feels valued by the physician. This study was
conducted to synthesize the results of the initial experiences of the first four years of this
new program. Thus, the purpose of this study was (a) to evaluate the effectiveness of an
interpersonal skills course for first-year medical students by (b) using methods of
meta-analysis (typically used in syntheses of research literature) to summarize results for four
successive medical school classes.

Methodology 

Instrumentation and Sample 

Carkhuff's (1969a) Standard Indexes of Discrimination (DI) and Communication (CI)
were administered to first-year medical students as part of student evaluation in a
nine-week course in interviewing and communication skills (Engler et al., 1981; Saltzman et al.,
1981). Scores on the Discrimination Index are determined by taking the average of the
absolute differences between students' and experts' ratings of 64 "typical" physician
responses to 16 patient statements. The Communication Index asks students to
respond to these same 16 patient statements (i.e., open format), and is scored with a
9-point scale ranging from level 1, "feeling and meaning both absent or both inaccurate," to
level 5, "accurate response to personalized feeling and personalized goal and accurate
identification of initial step" in treatment (Carkhuff, 1969a). Carkhuff (1969b) suggested
that "(a) final functioning at level 2.5 or above, or (b) training gains of three-fourths of a
level or more were reasonable goals for a successful training program."

Data were obtained from students who participated during the first four years of the
interpersonal skills curriculum. Sample sizes for the classes were 46 (graduation class of
1981), 43 (1982), 42 (1983), and 72 (class of 1984). Students completed the CI and DI
before participating in the interviewing and communication skills course and again after
completing the course. Characteristics of the participants also were examined as possible
mediators of the effects of training. These included gender, entry status (approximately
two-thirds of each class are admitted into a combined six-year B.S./M.D. program, while
the remainder are traditional students), and graduation class.
Design and Analyses 

A pretest-posttest pre-experimental design was used to evaluate the efficacy of this
training program for each class. Glass (1978, p. 356) noted that this design may be
considered primitive yet "adequate if the treated group members' pretreatment status is a
good estimate of their hypothetical post-treatment status in the absence of treatment."
This is an empirical question that can be examined to determine if maturation, pretest
sensitization, or other threats to the validity of this design have in fact biased this
estimate. Kraemer and Andrews (1982) noted that the effects resulting from a pre-post
design would be equal to those from an experimental-control design only "if one has prior
certainty of the absence of time effects and of placebo effects" (p. 407). Several recent
studies support the validity of the pre-post design as used in this evaluation.
McPherson, Wolf, and Sachs (1983) found significant differences on the
Carkhuff measure favoring a group which experienced skills training versus a group which
received information didactically. The skills training group did improve significantly from
pre- to posttesting, while the didactic group did not. Thus, no placebo effect resulting
from mere exposure to this content was anticipated or found. Similarly, in another study
using the Carkhuff measures (McPherson, Knopp, Sachs & Wolf, 1983), results of a
randomized pretest-posttest experimental-control group design indicated significant
improvement for the experimental versus the control group. Thus, neither time nor
maturation alone accounted for this effect, and no spontaneous improvement was
anticipated or found. Both of these studies support the utility of the pretest-posttest design used
in the present study. Indeed, Campbell (1982) indicated that the one group,
pretest-posttest design has "now been elevated to a useful quasi-experimental (or
proto-experimental) design" in the planned revision of his classic work on research design
(Campbell & Stanley, 1963).

Results for each of the four independent classes experiencing the interpersonal skills
curriculum were synthesized through the use of combined tests (Fisher, 1932; Rosenthal,
1978; Winer, 1971) and effect size analyses (Cohen, 1977; Glass, 1976, 1978; Hedges, 1982;
Rosenthal & Rubin, 1982a, 1982b). Wilcoxon Matched-Pairs Signed-Ranks Tests
(Marascuilo & McSweeney, 1977) and dependent t-tests were used to examine changes in
student performance on each of the Carkhuff indexes for each independent class.
Combined Probabilities 

Statistical methods available for combining the results of independent studies 
addressing a common research question range from various counting procedures to a 
variety of summation procedures involving either significance levels (probabilities or their 
logarithmic transformations) or raw or weighted test statistics such as t's or z's. These
latter procedures have become known as "combined tests" and were originally developed
independently by R. A. Fisher (1932) and Karl Pearson (1933). 

While a variety of combined tests are available, the suggestion to select a combined
test statistic consistent with the statistics used in the independent tests (Wolf, 1982) for
each class was followed in this study. Thus, the combined test offered by Winer (1971) for
summing t's was used to synthesize the dependent t-test results for each outcome measure
for each class. The Winer procedure for combining independent test results comes
directly from the sampling distribution of independent t-statistics, in which the t-statistics
associated with each test are summed and divided by the square root of the sum of the
degrees of freedom (df) associated with each t after each df has been divided by df-2.
This is based on df/(df-2) being the variance of a t distribution, which is approximately
normally distributed (N(0,1)) when df ≥ 10. This may be expressed in the form of

    Z_c = \frac{\sum t}{\sqrt{\sum [df/(df-2)]}}    (1)
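
The Winer procedure described above can be checked numerically. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that each paired t-test has df = n - 1, using the t values and class sizes listed in Table 1 for the Discrimination Index. Under those assumptions it gives a combined value of about 17.85.

```python
import math


def winer_combined_z(t_values, dfs):
    """Sum the t statistics and divide by the square root of the
    summed t-distribution variances, df / (df - 2)."""
    if len(t_values) != len(dfs):
        raise ValueError("t_values and dfs must have the same length")
    if any(df <= 2 for df in dfs):
        raise ValueError("each df must exceed 2 for the variance to exist")
    variance_sum = sum(df / (df - 2) for df in dfs)
    return sum(t_values) / math.sqrt(variance_sum)


# Paired t statistics for the Discrimination Index (Table 1).
t_values = [8.95, 9.86, 7.14, 10.53]
# Assumption: df = n - 1 for each class, with n taken from Table 1.
dfs = [46 - 1, 43 - 1, 42 - 1, 73 - 1]

z_c = winer_combined_z(t_values, dfs)
print(f"Winer combined Z_c = {z_c:.2f}")
```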

The Stouffer test (Stouffer, 1949; Mosteller & Bush, 1954; Rosenthal, 1978) for
summing z's was used to synthesize results of the individual Wilcoxon analyses. It is
similar to the Winer procedure with the exception that z's instead of t's are summed. The
denominator then simplifies to the square root of the number of tests combined. This
procedure is based on the sum of normal deviates being itself a normal deviate, with the
variance equal to the number of observations (N) summed. The complete expression takes
the form of

    Z_c = \frac{\sum z}{\sqrt{N}}    (2)
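
A minimal Python sketch of the Stouffer procedure follows. The function name is hypothetical; the z values are the Wilcoxon results listed in Tables 2 and 4. The computed values are close to -11.2 for the Discrimination Index and -11.03 for the Communication Index.

```python
import math


def stouffer_combined_z(z_values):
    """Sum the normal deviates and divide by the square root of
    the number of tests combined."""
    if not z_values:
        raise ValueError("at least one z value is required")
    return sum(z_values) / math.sqrt(len(z_values))


# Wilcoxon z values by class (Tables 2 and 4).
discrimination_z = [-5.34, -5.58, -4.76, -6.75]
communication_z = [-5.91, -5.69, -4.71, -5.75]

z_c_discrimination = stouffer_combined_z(discrimination_z)
z_c_communication = stouffer_combined_z(communication_z)
print(f"Discrimination Z_c = {z_c_discrimination:.3f}")
print(f"Communication  Z_c = {z_c_communication:.3f}")
```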



Fail-Safe N or File-Drawer Problem 

Rosenthal (1979) pointed out that published studies more often include results that
are statistically significant than do unpublished studies. Thus, results of
the above combined tests may be biased toward statistical significance.
This might occur if there were evaluation results for other classes buried away in
file-drawers. It is possible to estimate the number of studies confirming the null
hypothesis that would be necessary to reverse the conclusion of a combined test that a
significant effect or relationship exists. When the significance level is set so that p = .05,
this fail-safe N, as Cooper (1979) referred to it, can be calculated using the following
formula:

    N_{fs} = \left( \frac{\sum z}{1.645} \right)^2 - N    (3)

where \sum z = sum of the individual z-tests (or t-tests when the Winer procedure is used), N =
number of studies combined, and 1.645 is the normal value (z) for p = .05. If p = .01, then
1.645 is replaced by 2.33. A large fail-safe N would suggest that we may place greater
confidence in significant results of combined tests, as many additional studies with no
effect would be needed to reverse the conclusion of significance. Conversely, a small
fail-safe N would call into question the significance of obtained results.
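
The fail-safe N calculation can be sketched in Python as follows. This is an illustration under stated assumptions: the function name and the `z_critical` parameter are hypothetical, and the inputs are the Discrimination Index test statistics listed in Tables 1 and 2. With those inputs the sketch gives roughly 488 (t-tests) and 182 (Wilcoxon z's).

```python
def fail_safe_n(test_statistics, z_critical=1.645):
    """Number of additional null-result studies needed to bring the
    combined test down to the critical value: (sum / z_critical)^2 - N."""
    n_studies = len(test_statistics)
    if n_studies == 0:
        raise ValueError("at least one test statistic is required")
    return (sum(test_statistics) / z_critical) ** 2 - n_studies


# Discrimination Index results by class.
paired_t = [8.95, 9.86, 7.14, 10.53]      # Table 1
wilcoxon_z = [-5.34, -5.58, -4.76, -6.75]  # Table 2

n_fs_t = fail_safe_n(paired_t)
n_fs_z = fail_safe_n(wilcoxon_z)
print(f"fail-safe N (t-tests):  {n_fs_t:.1f}")
print(f"fail-safe N (Wilcoxon): {n_fs_z:.1f}")
```

Because the summed statistic is squared, the sign convention of the z values does not affect the result.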
Effect Size Estimation

Statistical tests such as the combined procedures previously described provide a 
summary index of the statistical significance of the results pertaining to an hypothesis. 
They do not, however, provide any insight into the strength of the relationship or effect of 
interest. The desirability of accompanying combined tests with indexes of effect size has 
been noted by Rosenthal (1978). Glass's exposition and application of meta-analysis relies
heavily on the use of measures of effect size that have been eloquently summarized by
Cohen (1977). Cohen states, "Without intending any necessary implication of causality, it
is convenient to use the phrase 'effect size' to mean 'the degree to which the phenomenon
is present in the population', or 'the degree to which the null hypothesis is false'.
Whatever the manner of representation of a phenomenon in a particular research in the 
present treatment, the null hypothesis always means that the effect size is zero" (pp. 9- 
10). 

The goal is to obtain "a pure number, one free of our original measurement unit,
with which to index what can be alternatively called the degree of departure from the null
hypothesis of the alternative hypothesis, or the ES (effect size) we wish to detect. This is
accomplished by standardizing the raw effect size as expressed in the measurement unit
of the dependent variable by dividing it by the (common) standard deviation of the
measures in their respective populations, the latter also in the original measurement"
(Cohen, 1977, p. 20). This may be accomplished in the form of

    d = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma}    (4)

where d = ES index for t-tests of means in standard units, \bar{x}_1 and \bar{x}_2 = sample means in
original measurement units, and \sigma = standard deviation of either sample (as homogeneity
of variance is assumed). The means, \bar{x}_1 and \bar{x}_2, are typically the experimental and control
group means in posttest-only control group experimental designs, or pre- and post means
in one group pretest-posttest pre-experimental designs, as used in this study.
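
As a worked illustration of the effect size index, the Python sketch below computes d for each class on the Discrimination Index. The function and variable names are hypothetical. Using the pretest standard deviation as the standardizer is an assumption: the text says only that the standard deviation of either sample may be used, but the pretest value appears to reproduce the d values listed in Table 1.

```python
def cohen_d(mean_1, mean_2, sd):
    """Standardized mean difference: |mean_1 - mean_2| / sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return abs(mean_1 - mean_2) / sd


# (pretest mean, pretest SD, posttest mean) by graduation year, from Table 1.
# Assumption: the pretest SD serves as the standardizer.
table_1 = {
    1981: (0.99, 0.23, 0.65),
    1982: (0.95, 0.20, 0.65),
    1983: (1.00, 0.23, 0.71),
    1984: (1.04, 0.27, 0.71),
}

d_by_year = {
    year: cohen_d(pre_mean, post_mean, pre_sd)
    for year, (pre_mean, pre_sd, post_mean) in table_1.items()
}
average_d = sum(d_by_year.values()) / len(d_by_year)

for year, d in d_by_year.items():
    print(f"{year}: d = {d:.2f}")
print(f"average d = {average_d:.2f}")
```

Lower Discrimination scores indicate closer agreement with expert ratings, which is why the pretest means exceed the posttest means; the absolute value keeps d positive.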

Once the effect size, d, is determined, Cohen provides tables to translate d into 
measures of nonoverlap (U) between the two groups, which translate rather nicely into 
graphical displays which facilitate interpretation of the results. Perhaps the most useful 
index of nonoverlap is Cohen's U3, which translates average performance in percentiles
(area under the normal curve) of the posttest (or experimental) group to the equivalent 
percentile of the pretest (or control) group. 
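
Because U3 is the area under the standard normal curve below d, it can be obtained without Cohen's tables. The short Python sketch below uses the standard library's `statistics.NormalDist`; the function name `u3_percent` is hypothetical, and normality of the score distributions is assumed, as it is in Cohen's tables.

```python
from statistics import NormalDist


def u3_percent(d):
    """Percentile of the pretest (or control) distribution that matches
    the posttest (or experimental) mean, assuming normal distributions
    with equal variances."""
    return 100 * NormalDist().cdf(d)


# Average effect sizes reported for the two indexes.
u3_discrimination = u3_percent(1.37)
u3_communication = u3_percent(2.55)
print(f"U3 for d = 1.37: {u3_discrimination:.1f}%")
print(f"U3 for d = 2.55: {u3_communication:.1f}%")
```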

Results and Discussion 
Data were analyzed separately for each of the two criterion measures. This is
consistent with some meta-analytic studies (e.g., Kulik, Kulik & Cohen, 1979; Mazzuca,
1982) but inconsistent with those that have combined all outcome measures in one analysis
(e.g., Smith & Glass, 1977). The former approach was taken so that more precise
information would be available for future curricular planning than if results for the two
Carkhuff measures had been combined. It is possible that the training program may have
influenced performance on the two measures differentially. This would be obscured in one
larger analysis where effects might even cancel each other.
Overall Combined Results and Effect Sizes 

Results for each of the four classes on the Carkhuff Discrimination Index are
summarized in Table 1. Significant (p < .001) gains in performance from pre- to posttesting
were exhibited by each class, with paired t-tests ranging between 7.14 and 10.53. Results
of the Winer combined test supported the research hypothesis of a significant gain in
Discrimination performance (i.e., more accurate discrimination) when the scope of the
inference is with respect to the combined populations (Z_c = 17.85). The probability of
obtaining this value of Z_c or one larger is p (Z_c ≥ 17.85) < .001, one-tailed.



Insert Table 1 about here

Wilcoxon matched-pairs analyses reported in Table 2 were consistent with the paired
t-test and Winer combined test results. A significant number of students in each class
exhibited significant improvement (p < .001), with z's ranging between -4.76 and -6.75. The
Stouffer combined test also supported the research hypothesis, with the probability of
obtaining this z value or one smaller being p (Z_c ≤ -11.22) < .001, one-tailed. Only 18 of
204 students across all four classes failed to improve on the Discrimination Index.



Insert Table 2 about here 
Effect sizes in Table 1 ranged between 1.22 and 1.50 standard deviation units, with
an average effect size of 1.37 (SD = .13). Cohen (1977) provides interpretative guidelines
for effect size, with d = .2 indicative of a small effect, d = .5 indicative of a medium effect,
and d = .8 indicative of a large effect. Each of the individual class effects, as well as the
average effect, may be considered large in magnitude. Translating these effect sizes (d)
into measures of nonoverlap (U) is accomplished by referring to tables in Cohen's text (1977).
Alternatively, a normal distribution table may be used, as these values are equivalent to
Cohen's tabled values. An average d value of 1.37 translates into a U3 value of 91.5%.
This means that the average score (50th percentile) on the posttest was equivalent to the
91.5th percentile on the pretest. This is depicted graphically in Figure 1.



Insert Figure 1 about here 



Results for performance on the Communication Index are summarized in Table 3.
Students in each class again exhibited significant improvement (p < .001), with paired
t-tests ranging between -8.55 and -24.18. Results of the Winer combined test were also
significant and indicated the probability of obtaining this value of Z_c (-28.51) or one
smaller is p < .001, one-tailed. Results of the Wilcoxon tests summarized in Table 4
supported these results, as only 12 of 205 students failed to improve on the
Communication Index. Results of the Stouffer combined test likewise supported the
research hypothesis (Z_c = -11.03; p < .001, one-tailed).



Insert Tables 3 and 4 about here 
Effect sizes for the individual classes ranged between 1.48 and 3.52 standard
deviation units, with an average effect size of 2.55 (SD = .81). Translating this average
effect size (d) of 2.55 into the U3 measure of nonoverlap indicated that the average
score (50th percentile) on the Communication posttest was equivalent to the 99.5th
percentile on the pretest. The average student could expect to improve 2.55 standard
deviation units as a result of this program. These results are depicted graphically in
Figure 2.



Insert Figure 2 about here 



Fail-Safe N Results

For results pertaining to the Discrimination Index, the number of tests supporting
the null hypothesis necessary to reverse the findings reported above (i.e., to find a
combined test result of p > .05) was 488 if t-tests were used, or 182 studies if the more
conservative Wilcoxon tests were used. For the Communication Index, 1,245 additional
studies with null results would be needed to reverse the conclusion of a significant effect
in the t-test analyses. Approximately 180 null results would be needed to reverse the
Wilcoxon findings. Thus, the findings reported here appear to be robust and well above
Rosenthal's (1979) "tolerance level" for null effects.
Analyses of Mediating Effects 

Hypothesis tests were categorized according to the potential mediators of the 
effects of the interpersonal skills curriculum. Effect sizes were computed for each 
subgrouping of gender, entry status, and graduation year. Correlations were then 
computed between each potential mediator and the effect sizes for the pertinent 
subgroupings. Average effect sizes for the subgroupings are summarized in Table 5, 
except for graduation year which is included in Tables 1 and 3. 

Gender. The average effect size on the Communication Index was 2.25
(SD = .80) for males and 2.06 (SD = 1.47) for females. The point-biserial correlation between
gender and ES, using ESs for individual classes for males and females, was .18 (n.s.; n = 6).
The average effect size on the Discrimination Index was 1.40 (SD = .29) for males and
1.21 (SD = .65) for females. Again the correlation was non-significant (r_pbi = .08; n = 6).
Thus, gender does not appear to mediate the effect of the program, as no significant
differences between males and females were found.

Entry Status. Average effect sizes on the Communication Index for students in the
combined six-year B.S./M.D. program and for traditional students were 1.91 (SD = 1.03) and
3.10 (SD = 1.33) standard deviation units, respectively. Average effect sizes on the
Discrimination Index were 1.12 (SD = .16) for B.S./M.D. students and 1.64 (SD = .12) for
traditional students. Results of point-biserial correlational analyses indicated that the
effects of the curriculum were greater for traditional students on both the
Communication (r_pbi = .45; n = 6; p < .10, two-tailed) and Discrimination (r_pbi = .88; n = 6;
p < .05) Indexes. Further examination of pre- and posttest average scores indicated that
this finding is the result of the B.S./M.D. students entering the program with better skills.
The training program acted as a "leveler", as there were no differences between B.S./M.D.
and traditional students at posttesting (independent t-tests ranged between -0.92 and 1.19,
n.s.). On the pretest, however, B.S./M.D. students in each class consistently performed
significantly better than their traditional cohorts on the Communication Index
(independent t-tests ranged between 2.38 and 3.18, p < .03) and on the Discrimination
Index (class of 1984 only). This is likely the result of a course the B.S./M.D. students
receive prior to this interpersonal skills program.

Graduating Class. Year of graduation and the effect size for each year, summarized
for the Discrimination Index in Table 1 and for the Communication Index in Table 3, were
correlated to test the research hypotheses of significant declines in performance for more
recent classes. Correlations were -.91 (n = 4; p < .05, one-tailed) for the Discrimination
Index and -.98 (n = 4; p < .01, one-tailed) for the Communication Index. These results
indicate an almost perfect negative relationship between recency of graduation and
performance on the two Indexes.
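
The correlations between graduation year and effect size can be approximated from the d values listed in Tables 1 and 3. The Python sketch below is illustrative, with a hypothetical helper `pearson_r`; it assumes an ordinary Pearson correlation over the four classes. With the tabled (rounded) d values it gives about -.91 for the Discrimination Index and about -.99 for the Communication Index, the latter differing slightly from the reported -.98, presumably because of rounding in the tabled values.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = [1981, 1982, 1983, 1984]
d_discrimination = [1.48, 1.50, 1.26, 1.22]  # Table 1
d_communication = [3.52, 3.12, 2.07, 1.48]   # Table 3

r_discrimination = pearson_r(years, d_discrimination)
r_communication = pearson_r(years, d_communication)
print(f"Discrimination: r = {r_discrimination:.3f}")
print(f"Communication:  r = {r_communication:.3f}")
```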

It is noteworthy, however, that the effects of the program for each of the classes
may be considered large based on Cohen's (1977) criteria. Each class attained Carkhuff's
(1969b) two criteria for a successful training program, training gains of three-fourths of a
level and final communication scores of 2.5 or above. There are several plausible rival
explanations for this decreasing trend between effect size and recency of year of
graduation. Because the class of 1981 was the first graduating class at this new medical
school, perhaps greater care was taken in selection procedures. Thus, admission policy
may have changed over the course of these four years. Larger class sizes may be a
factor. However, because this course is taught in small groups of 10-12 students, this
would most likely not be an influence unless the addition of new instructors affected the
quality of instruction. Finally, some changes in the curriculum may have occurred over
this period. Thus, an examination of the stability and change in (a) admission standards,
(b) class size, (c) instructors or their performance, (d) curriculum, and (e) other student
characteristics is necessary to more fully understand the meaning of the relationship
between graduation year and effect size.

Conclusions 

Even though student performance on the CI and DI increased significantly for each
of the medical school classes, the magnitude of the effect varied. Using the Winer and
Stouffer combined tests and Cohen's measure of effect size provides more stable summary
indexes of the impact of the course on student performance. As each successive class
completes the course, data may be added to previous years' results and the Winer,
Stouffer, and Cohen statistics may be recalculated. Thus, the curriculum may be
compared vis-a-vis these student performance measures from year to year, as well as
over all the years it is offered. Combining and synthesizing these individual class findings
permits greater generalizability and confidence in the evaluation results of the program
than do individual results based upon smaller sample sizes. As data accumulate, trends in
student performance may be noted that have implications for curricular planning and
development. Two such mediating relationships were found in the present study. First,
traditional students' learning gains as evidenced on the Carkhuff measures were superior
to gains of the six-year combined B.S./M.D. students. This was a result of B.S./M.D.
students entering the program with greater skills, most likely as a result of educational
training earlier in their program that is related to the content of this course. Second,
the effect of training, while significant and large in magnitude for each class, appears to
be declining with each successive class. This trend merits closer examination and
understanding.

Evidence was cited from other studies to support the validity of the findings of the
pretest-posttest design used in the present study. It is unlikely that the large effects of
this curriculum were the result of maturation, time, spontaneous improvement, or a
placebo effect of merely attending didactic sessions. Over 180 additional studies with no
effect would be necessary to reverse the conclusion of a significant effect of training.
Clearly, the program did have a significant positive impact on first-year medical students'
communication and interviewing skills. Whether these skills become more fully
developed, refined, and eventually used in interactions with patients in the future are
important issues that merit examination.
Table 1

Means, Standard Deviations, Paired t-Tests and Effect Sizes for Medical
Student Performance on Pre and Post Standard Indexes of Discrimination

                          Pre             Post
Graduation Year    n     M     SD       M     SD     Paired t     d     U3 (%)

1981              46    .99   .23      .65   .17       8.95*     1.48    93.1
1982              43    .95   .20      .65   .15       9.86*     1.50    93.3
1983              42   1.00   .23      .71   .16       7.14*     1.26    89.6
1984              73   1.04   .27      .71   .18      10.53*     1.22    88.9

Average                                                          1.37    91.5

*p < .001, two-tailed test
Table 2

Wilcoxon Matched-Pairs Test for Change in Performance
on Standard Index of Discrimination

                         Number of Students
Graduation Year    n    Declined   Same   Improved      z

1981              46        3        0       43       -5.34*
1982              43        2        0       41       -5.58*
1983              42        4        0       38       -4.76*
1984              73        9        0       64       -6.75*

*p < .001, two-tailed test
Table 3

Means, Standard Deviations, Paired t-Tests and Effect Sizes for Medical
Student Performance on Pre and Post Standard Indexes of Communication

                          Pre             Post
Graduation Year    n     M     SD       M     SD        t         d     U3 (%)

1981              46   1.55   .30     2.60   .22     -24.18*     3.52    99.9
1982              44   1.32   .39     2.54   .48     -14.16*     3.12    99.9
1983              42   1.47   .52     2.55   .59      -8.55*     2.07    98.0
1984              73   1.73   .50     2.47   .29     -11.28*     1.48    93.1

Average                                                          2.55    99.5

*p < .001, two-tailed test
Table 4

Wilcoxon Matched-Pairs Test for Change in Performance
on Standard Index of Communication

                         Number of Students
Graduation Year    n    Declined   Same   Improved      z

1981              46        0        0       46       -5.91*
1982              44        1        1       42       -5.69*
1983              42        2        1       39       -4.71*
1984              73        7        3       63       -5.75*

*p < .001, two-tailed test
Table 5

Average Effect Sizes for Subgroupings of Study Characteristics
for Communication and Discrimination Indexes

                          Communication       Discrimination
Characteristics           Mean d    SD d      Mean d    SD d      N

Gender
  Males                    2.25      .80       1.40      .29      3
  Females                  2.06     1.47       1.21      .65      3

Entry Status
  Combined B.S./M.D.       1.91     1.03       1.12      .16      3
  Traditional              3.10     1.33       1.64      .12      3

Note: N is the number of studies on which the average effect size (mean d) and SD d are based.



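[Figure 1 graphic: pretest and posttest distributions plotted against the percentile of the pretest distribution; the 50th percentile of the posttest distribution falls at the 91.5th percentile of the pretest distribution.]

Figure 1. Average effect size in standard deviation units (σ_x) of medical student performance on the Discrimination Index.

[Figure 2 graphic: only the label "2.55 σ_x" survives.]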

